Replace overall_safety_task#4081
Conversation
bfish713
left a comment
There was a problem hiding this comment.
I didn't thoroughly check the tests. Main comment is about how we spawn/cancel the timeout in the consistency_task
| result.insert(*view, leaf.1.clone()); | ||
| } | ||
|
|
||
| for (parent, child) in result.values().zip(result.values().skip(1)) { |
There was a problem hiding this comment.
nit: isn't this just leaf_pairs defined above
There was a problem hiding this comment.
I don't see a leaf_pairs in this scope, though I noticed the check was duplicated in a git merge (fixed)
| /// Handles an event from one of multiple receivers. | ||
| async fn handle_event(&mut self, (message, id): (Self::Event, usize)) -> Result<()> { | ||
| { | ||
| let mut timeout_task = spawn_timeout_task( |
There was a problem hiding this comment.
Does a timeout event here cancel timeout_task?
There was a problem hiding this comment.
Shouldn't we move this into the the if block just below, that way we are testing that we can get decides within the timeout (the event we care about)
There was a problem hiding this comment.
yeah timeouts (any event) cancels it, I did this intentionally at first to avoid having to manually increase the timeout on tests with a string of timeouts (e.g. staggered restart)
but now I think I agree with you that it should be moved. I'll make the change
| Ok(TestProgress::Incomplete) | ||
| } | ||
| } | ||
| pub async fn check_view_success(&self) -> Result<()> { |
There was a problem hiding this comment.
This checking that we didn't decide a view that we think should fail? can't we just check this super simply by looking at the leaf.view rather than getting it back out of our map
There was a problem hiding this comment.
yes you're right! this was just copy-paste
will fix
This PR:
overall_safety_task, replacing it with additional checks inconsistency_taskconsistency_taskruns checks after every decide event it receives, terminating the test if those checks failconsistency_tasknow times out if we take too long (currently 4 seconds) between external events (e.g. decides)num_failed_viewsis completely removedexpected_view_failures: vec![]possible_view_failures: vec![]This PR does not:
No checks are added to validate per-node view failures. These should be added in a future PR
Key places to review:
The added checks in
consistency_task.rs